Visualization of mosquitoโ€™s populations

We work with information about the location and detection time of two types of mosquitoes. Both Aedes aegypti and Aedes albopictus mosquitoes may spread viruses but Aedes aegypti are more likely to spread these viruses (and therefore are more dangerous).

At first we create a MapBox interface in Plotly to create two dot maps (for years 2004 and 2013) that show the distribution of the two types of mosquitos in the world (use color to distinguish between mosquitos).

Analyze which countries and which regions in these countries had high density of each mosquito type.

In 2004:
We can see high density of Aedes aegypti in:
Brazilโ€”>areas: Fortaleza , Sao Paulo
Venezuela โ€”> area: Caracas
Mexico โ€”> area: Veracruz
Indonesia โ€”> area: Jacarta
United State of America โ€”> area: Houston
Trinidad

And also we can see high density of Aedes albopictus in:
Unites States of America โ€”> areas: Missisipi, Oklahama
Taiwan โ€”> Given that all the counties and citiesโ€™ names were written in Chinese, we were unable to discern between them.

In 2013: We can see high density of Aedes aegypti in:
Brazil โ€”> Most areas of the brazil affected by this virus.

And also we can see high density of Aedes albopictus in:
Taiwan โ€”> Given that all the counties and citiesโ€™ names were written in Chinese, we were unable to discern between them.

More countries were impacted by these two mosquito species in 2004, but as of 2013, we can see that only Brazil and Taiwan are affected, despite the fact that their densities are higher in these two nations.

  1. Overplotting: In some high-density areas, overplotting can occur. And it makes it difficult to distinguish individual data (such as the names of areas which are affected) and can lead to a loss of information. For instance, Tiwan in 2004 and 2013, also Brazil in 2013.

  2. Data Density: Data points may be numerous in some areas and sparse in others, depending on the dataset and map scale. The viewerโ€™s perception of the areas with concentrated mosquito populations may be impacted by this non-uniform distribution. For instance, in 2004, we could not distinguish Taiwan as a high-density affected area, but after zooming a lot, we understood that this area was seriously affected.


In this task, we compute Z as the numbers of mosquitos per country detected during all study period. Use plot_geo() function to create a choropleth map that shows Z values. This map should have an Equirectangular projection.
The Equirectangular projection, also known as the Plate Carrรฉe projection, is one of the simplest and most commonly used map projections in cartography and geographic information systems (GIS). It maps meridians to equally spaced vertical straight lines, and circles of latitude to evenly spread horizontal straight lines.ย 

The Equirectangular projection is not ideal for navigation or large-scale mapping because of the significant distortion of shapes and sizes as you move away from the equator. Other map projections, such as the Mercator projection or Lambert Conformal Conic projection, are better suited for specific applications where preservation of area or distance is important.

Because the total number of mosquitos for some countries are very small, and for a few countries are large. Therefore, most of the countries show in the light red and this plot indicates so little information.
In addition, given that Taiwan, as a country, has been most affected by these mosquitoes, is tiny compared to a global scale space (Size aesthetics used), also we have overplotting in this area, at first glance and without continuous zooming, this level of density is not visible at all.
By comparing the values of Z in Tiwan and Brazil, we understood that Z is 24837 for Tiwan and 8501 for Brazil, although we only see Brazil as a high-density area that is colored in dark red.


Now, we should create the same kind of maps as in step 2 but use:

Equirectangular projection with choropleth color log (๐‘)


Conic equal area projection with choropleth color log (๐‘)

Analyze the map

This map is more informative than the previous one, because color mapping is better and we can distinguish areas which are infected more easier. Although larger regions with the same color looks dominating. And it is hard to identify small regions that are seriously infected (For instance: Taiwan).
This map shows unusual regions better, also here we can find clusters of regions that are similar easier.

In Equirectangular projection the latitude and longitude lines are evenly spaced, making it suitable for choropleth maps where data is distributed evenly across latitudes and longitudes. It is straightforward and easy to understand, making it accessible for a wide range of users.
In these types of projection preserves shape and angles along the equator, which can be beneficial when working with data concentrated in that region (Minimal Distortion near Equator). But as you move away from the equator, distortion becomes significant, causing areas to appear stretched vertically. This makes it less suitable for displaying data near the poles.
Choropleth maps with data concentrated at high latitudes may not be effectively represented using the equirectangular projection.

Conic equal area is less distortion in limited area. It minimizes distortion within a particular region of interest, making it suitable for maps focused on a specific part of the world. Conic equal area projections are best suited for specific regions; they may not be appropriate for displaying global data or data distributed over a wide range of latitudes.
Conic projections can be more complex to understand and work with compared to the equirectangular projection, requiring more advanced cartographic skills.
For instance, we can see Taiwan area in Conic equal area obviously, although it was not seen Equirectangular projection easily. Therefore, it could affect our results and decisions.


Part 4

In this part, in order to resolve problems detected in step 1, use data from 2013 only for Brazil and following below steps:

Create variable X1 by cutting X into 100 piecies (use cut_interval())


Create variable Y1 by cutting Y into 100 piecies (use cut_interval())


Compute mean values of X and Y per group (X1,Y1) and the amount of observations N per group (X1,Y1)

## # A tibble: 6 ร— 5
## # Groups:   X1 [5]
##   X1            Y1            mean_X mean_Y     N
##   <fct>         <fct>          <dbl>  <dbl> <int>
## 1 [-72.8,-72.4] (-8.21,-7.84]  -72.8  -7.96     1
## 2 (-71.2,-70.8] (-8.94,-8.57]  -70.8  -8.9      1
## 3 (-70.4,-70]   (-10.8,-10.4]  -70.0 -10.7      1
## 4 (-70,-69.6]   (-4.14,-3.77]  -69.7  -4.03     1
## 5 (-69.6,-69.2] (-10.8,-10.4]  -69.2 -10.7      1
## 6 (-69.6,-69.2] (-10.1,-9.68]  -69.4  -9.77     1

Visualize mean X,Y and N by using MapBox

Identify regions in Brazil that are most infected by mosquitoes. Did such discretization help in analyzing the distribution of mosquitoes?

The most infected regions in Brazil in 2013 by mosquitos include:
Caicara: N=16
Guarabira: N=15
Brejinho: N=14
Sao Paulo: N=13
Sitio Cruz de Alma: N=13
Mogeiro:13

By discretizing the geographic space into smaller regions we aggregate the data within each cell, making it easier to analyze patterns and variations in mosquito distribution. In addition, discretization allows us to identify patterns in mosquito distribution at a regional level. We can see which grid cells have higher or lower mosquito populations. This information can be valuable for understanding where mosquitoes are concentrated and where they are less prevalent.

Also discretization helps in analyzing the distribution of mosquitoes by simplifying spatial data, revealing regional patterns, enabling statistical analysis, and facilitating effective visual communication of findings. It provides a structured approach to studying mosquito populations in different regions, helping researchers and decision-makers better understand and manage mosquito-related issues.

In summary, with such discretization, we can see the distribution of mosquitoes quickly. And we can identify regions that are most infected by mosquitoes more easily and analyze their distribution more effectively.


Apendix

knitr::opts_chunk$set(
    echo = TRUE,
    message = FALSE,
    warning = FALSE
)
hr {
  border: 1px solid black; /* Adjust the thickness and color as needed */
}

#Visualization of mosquitoโ€™s populations
data <- read.csv("aegypti_albopictus.csv")
library(dplyr)
library(ggplot2)
#library(MASS)

library(plotly)
# Part 1
#library(dplyr)
#mapbox
#Create dot map for year=2004
Sys.setenv('MAPBOX_TOKEN' = "pk.eyJ1IjoibW9ubW81MzgiLCJhIjoiY2xtbnZ3MDRsMHkwNjJrcG5jaW1raXVybyJ9.HxbeuV14U8LSJL_nG5fGiw")

data1 <- data%>%filter(YEAR=="2004")
p<-plot_mapbox(data1)%>%
  add_trace(type="scattermapbox",lat=~Y, lon=~X,
                               color=~VECTOR, text=~COUNTRY)
p <- p %>% layout(
  title = "Mosquito Populations in 2004",
  mapbox=list(
    style="open-street-map")
)
p


#Create dot map for year=2013
Sys.setenv('MAPBOX_TOKEN' = "pk.eyJ1IjoibW9ubW81MzgiLCJhIjoiY2xtbnZ3MDRsMHkwNjJrcG5jaW1raXVybyJ9.HxbeuV14U8LSJL_nG5fGiw")

data2 <- data%>%filter(YEAR=="2013")
p<-plot_mapbox(data2)%>%
  add_trace(type="scattermapbox",lat=~Y, lon=~X,
                                  color=~VECTOR, text=~COUNTRY)
p <- p %>% layout(
  title = "Mosquito Populations in 2013",
  mapbox=list(
    style="open-street-map")
)
p


# Part 2
library(dplyr)
Z <- data%>%
  group_by(COUNTRY)%>%
  #n() function specifically counts the number of rows (observations) within each group
  summarise(Total_num_Mosquitos = n())

# Merge the total mosquito counts (Z) with the original data using COUNTRY as the key
merged_data <- merge(data, Z, by = "COUNTRY")


# Create the choropleth map
library(plotly)

# WORLD MAP
g <- list(
  projection = list(type = 'equirectangular')  # Use equirectangular projection
)

p <- plot_geo(merged_data) %>%
  add_trace(
    z = ~Total_num_Mosquitos, color = ~Total_num_Mosquitos, colors = 'Reds',
    text = ~COUNTRY, locations = ~COUNTRY_ID
  ) %>%
  layout(
    title = "Numbers of Mosquitos per Country",
    geo = g
  )

p


#Create the choropleth map: Equirectangular projection with choropleth color log(๐‘)
library(plotly)

# WORLD MAP
g <- list(
  projection = list(type = 'equirectangular')  # Use equirectangular projection
)

p <- plot_geo(merged_data) %>%
  add_trace(
    z = ~log(Total_num_Mosquitos), color = ~log(Total_num_Mosquitos), colors = 'Reds',
    text = ~COUNTRY, locations = ~COUNTRY_ID
  ) %>%
  layout(
    title = "Numbers of Mosquitos per Country_Equirectangular projection with choropleth color log (๐‘)",
    geo = g
  )

p



#Create the choropleth map: Conic equal area projection with choropleth color log(๐‘)
library(plotly)

# WORLD MAP
g <- list(
  projection = list(type = 'conic equal area')  # Use conic equal area projection
)

p <- plot_geo(merged_data) %>%
  add_trace(
    z = ~log(Total_num_Mosquitos), color = ~log(Total_num_Mosquitos), colors = 'Reds',
    text = ~COUNTRY, locations = ~COUNTRY_ID
  ) %>%
  layout(
    title = "Numbers of Mosquitos per Country_Conic equal area projection with choropleth color log (๐‘)",
    geo = g
  )

p


#Filtering data from 2013 only for Brazil
data3 <- data%>%filter(YEAR=="2013" & COUNTRY=="Brazil")

# Create variable X1 by cutting X into 100 piecies
data3$X1 <- cut_interval(data3$X, n = 100)


#Create variable Y1 by cutting Y into 100 piecies
data3$Y1 <- cut_interval(data3$Y, n = 100)

# Part 4-c
#Compute mean values of X and Y per group (X1,Y1) and the amount of observations
#N per group (X1,Y1)
mean_X_Y <- data3%>%
  group_by(X1,Y1)%>%
  summarize(mean_X = mean(X),
            mean_Y = mean(Y), 
            N=n())

head(mean_X_Y)


#Visualize mean X,Y and N by using MapBox

Sys.setenv('MAPBOX_TOKEN' = "pk.eyJ1IjoibW9ubW81MzgiLCJhIjoiY2xtbnZ3MDRsMHkwNjJrcG5jaW1raXVybyJ9.HxbeuV14U8LSJL_nG5fGiw")

p <- plot_mapbox(mean_X_Y)%>%add_trace(type="scattermapbox",                                                       lat=~mean_X_Y$mean_Y,
                                       lon=~mean_X_Y$mean_X,
                                       mode = "markers",
                               text=~paste("Mean_X =", mean_X_Y$mean_X,
                                           "Mean_Y = ", mean_X_Y$mean_Y,
                                           "N = ", mean_X_Y$N),
                               hoverinfo = "text",
                               marker=list(size = mean_X_Y$N))

p <- p %>% layout(
  title = "mean X,Y and N",
  mapbox=list(
    style="open-street-map")
)
p


# p <- plot_mapbox(x=mean_X_Y$mean_X, Y= mean_X_Y$mean_Y, color=~mean_X_Y$N)
# p